A simple solution: double pipeline L+3

Min is very dubious on this: given that the qs can be bonded in 2 cycles, a 7th input is going to be needed for optimal quicksilver latency.